{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ur8xi4C7S06n"
   },
   "outputs": [],
   "source": [
    "# Copyright 2021 Google LLC\n",
    "#\n",
    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
    "# you may not use this file except in compliance with the License.\n",
    "# You may obtain a copy of the License at\n",
    "#\n",
    "#     https://www.apache.org/licenses/LICENSE-2.0\n",
    "#\n",
    "# Unless required by applicable law or agreed to in writing, software\n",
    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
    "# See the License for the specific language governing permissions and\n",
    "# limitations under the License."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "tvgnzT1CKxrO"
   },
   "source": [
    "### Overview\n",
    "\n",
    "In this notebook, you'll learn how to submit a job to the Vertex AI training service. In the job you'll train your TensorFlow 2 model and export the saved model to Cloud Storage.\n",
    "\n",
    "### Dataset\n",
    "\n",
    "[CTA - Ridership - Daily Boarding Totals](https://data.cityofchicago.org/Transportation/CTA-Ridership-Daily-Boarding-Totals/6iiy-9s97): This dataset shows systemwide boardings for both bus and rail services provided by Chicago Transit Authority, dating back to 2001.\n",
    "\n",
    "### Objective\n",
    "\n",
    "The goal is to forecast future transit ridership in the City of Chicago, based on previous ridership."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "i7EUnXsZhAGF"
   },
   "source": [
    "## Install packages and dependencies"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "XoEqT2Y4DJmf"
   },
   "source": [
    "### Import libraries and define constants"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 161
    },
    "colab_type": "code",
    "id": "kRv5imUnKkuP",
    "outputId": "f2f8b527-1b8b-4d7b-c69d-dc9426f62cf6"
   },
   "outputs": [],
   "source": [
    "import datetime\n",
    "import os\n",
    "import time\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import tensorflow as tf\n",
    "\n",
    "from google.cloud import aiplatform, storage\n",
    "from google.cloud.aiplatform import gapic as aip\n",
    "from sklearn.preprocessing import StandardScaler"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check the TensorFlow version installed\n",
    "\n",
    "tf.__version__"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "oM1iC_MfAts1"
   },
   "outputs": [],
   "source": [
    "# Enter your project, region, and a bucket name. Then run the  cell to make sure the\n",
    "# Cloud SDK uses the right project for all the commands in this notebook.\n",
    "\n",
    "PROJECT = 'your-project-name' # REPLACE WITH YOUR PROJECT ID\n",
    "BUCKET = 'your-regional-bucket' # REPLACE WITH A UNIQUE REGIONAL BUCKET NAME e.g. your PROJECT NAME\n",
    "REGION = 'us-central1' # REPLACE WITH YOUR BUCKET REGION e.g. us-central1\n",
    "BUCKET_URI = 'gs://' + BUCKET\n",
    "\n",
    "#Don't change the following command - this is to check if you have changed the project name above.\n",
    "assert PROJECT != 'your-project-name', 'Don''t forget to change the project variables!'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize the Vertex SDK for Python\n",
    "\n",
    "aiplatform.init(project=PROJECT, location=REGION, staging_bucket=BUCKET)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Dataset parameters\n",
    "\n",
    "target_col = 'total_rides' # The variable you are predicting\n",
    "ts_col = 'service_date' # The name of the column with the date field"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model parameters\n",
    "\n",
    "freq = 'D' # Daily frequency\n",
    "n_input_steps = 30 # Lookback window\n",
    "n_output_steps = 7 # How many steps to predict forward\n",
    "n_seasons = 7 # Monthly periodicity\n",
    "\n",
    "train_split = 0.8 # % Split between train/test data\n",
    "epochs = 1000 # How many passes through the data (early-stopping will cause training to stop before this)\n",
    "patience = 5 # Terminate training after the validation loss does not decrease after this many epochs\n",
    "\n",
    "lstm_units = 64\n",
    "input_layer_name = 'lstm_input'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Training parameters\n",
    "\n",
    "MODEL_NAME = 'cta_ridership'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "zgPO1eR3CYjk"
   },
   "source": [
    "### Create a Cloud Storage bucket\n",
    "\n",
    "**The following steps are required, regardless of your notebook environment.**\n",
    "\n",
    "When you submit a training job using the Cloud SDK, you upload a Python package\n",
    "containing your training code to a Cloud Storage bucket. AI Platform runs\n",
    "the code from this package. In this tutorial, AI Platform also saves the\n",
    "trained model that results from your job in the same bucket. You can then\n",
    "create an AI Platform model version based on this output in order to serve\n",
    "online predictions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "cellView": "both",
    "colab": {},
    "colab_type": "code",
    "id": "MzGDU7TWdts_"
   },
   "outputs": [],
   "source": [
    "storage_client = storage.Client()\n",
    "try:\n",
    "    bucket = storage_client.get_bucket(BUCKET)\n",
    "    print('Bucket exists, let''s not recreate it.')\n",
    "except:\n",
    "    bucket = storage_client.create_bucket(BUCKET)\n",
    "    print('Created bucket: ' + BUCKET)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "M8_SY9abGxCc"
   },
   "source": [
    "## Load and preview the data\n",
    "\n",
    "Pre-processing on the original dataset has been done for you and made available on Cloud Storage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "processed_file = 'cta_ridership.csv' # Which file to save the results to\n",
    "\n",
    "if os.path.exists(processed_file):\n",
    "    input_file = processed_file # File created in previous lab\n",
    "else:\n",
    "    input_file = f'data/{processed_file}'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "JOfBsktiGCOp"
   },
   "outputs": [],
   "source": [
    "df = pd.read_csv(input_file, index_col=ts_col, parse_dates=True)\n",
    "\n",
    "# Plot 30 days of ridership \n",
    "_ = df[target_col][:30].plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define some characteristics of the data that will be used later\n",
    "n_features = len(df.columns)\n",
    "\n",
    "# Index of target column. Used later when creating dataframes.\n",
    "target_col_num = df.columns.get_loc(target_col)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "067UQKwlVBUf"
   },
   "source": [
    "### Process data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 227
    },
    "colab_type": "code",
    "id": "59PwFlYDU13-",
    "outputId": "b4b59f59-6d39-4235-8f9f-8a73d7e1cd0e"
   },
   "outputs": [],
   "source": [
    "# Split data\n",
    "\n",
    "size = int(len(df) * train_split)\n",
    "df_train, df_test = df[0:size].copy(deep=True), df[size:len(df)].copy(deep=True)\n",
    "\n",
    "df_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 307
    },
    "colab_type": "code",
    "id": "Yn9RSS3DVEtt",
    "outputId": "cd066456-b419-4e0f-8c52-2fd7d82123ba"
   },
   "outputs": [],
   "source": [
    "_ = df_train.plot()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jlgFdwM9VOnd"
   },
   "source": [
    "### Scale values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Review original values\n",
    "\n",
    "df_train.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "xqRM0Wt6VKzm"
   },
   "outputs": [],
   "source": [
    "# For neural networks to converge quicker, it is helpful to scale the values.\n",
    "# For example, each feature might be transformed to have a mean of 0 and std. dev. of 1.\n",
    "#\n",
    "# You are working with a mix of features, input timesteps, output horizon, etc.\n",
    "# which don't work out-of-the-box with common scaling utilities.\n",
    "# So, here are a couple wrappers to handle scaling and inverting the scaling.\n",
    "\n",
    "feature_scaler = StandardScaler()\n",
    "target_scaler = StandardScaler()\n",
    "\n",
    "def scale(df, \n",
    "          fit=True, \n",
    "          target_col=target_col,\n",
    "          feature_scaler=feature_scaler,\n",
    "          target_scaler=target_scaler):\n",
    "    \"\"\"\n",
    "    Scale the input features, using a separate scaler for the target.\n",
    "    \n",
    "    Parameters: \n",
    "    df (pd.DataFrame): Input dataframe\n",
    "    fit (bool): Whether to fit the scaler to the data (only apply to training data)\n",
    "    target_col (pd.Series): The column that is being predicted\n",
    "    feature_scaler (StandardScaler): Scaler used for features\n",
    "    target_scaler (StandardScaler): Scaler used for target\n",
    "      \n",
    "    Returns: \n",
    "    df_scaled (pd.DataFrame): Scaled dataframe   \n",
    "    \"\"\"    \n",
    "    \n",
    "    target = df[target_col].values.reshape(-1, 1)\n",
    "    if fit:\n",
    "        target_scaler.fit(target)\n",
    "    target_scaled = target_scaler.transform(target)\n",
    "    \n",
    "    # Select all columns other than target to be features\n",
    "    features = df.loc[:, df.columns != target_col].values\n",
    "    \n",
    "    if features.shape[1]:  # If there are any features\n",
    "        if fit:\n",
    "            feature_scaler.fit(features)\n",
    "        features_scaled = feature_scaler.transform(features)\n",
    "        \n",
    "        # Combine target and features into one data frame\n",
    "        df_scaled = pd.DataFrame(features_scaled)\n",
    "        target_col_num = df.columns.get_loc(target_col)\n",
    "        df_scaled.insert(target_col_num, target_col, target_scaled)\n",
    "        df_scaled.columns = df.columns        \n",
    "    \n",
    "    else:  # If only target column (no additional features)\n",
    "        df_scaled = pd.DataFrame(target_scaled, columns=df.columns)\n",
    "      \n",
    "    return df_scaled\n",
    "\n",
    "def inverse_scale(data, target_scaler=target_scaler):\n",
    "    \"\"\"\n",
    "    Transform the scaled values of the target back into their original form.\n",
    "    The features are left alone, as we're assuming that the output of the model only includes the target.\n",
    "    \n",
    "    Parameters: \n",
    "    data (np.array): Input array\n",
    "    target_scaler (StandardScaler): Scaler used for target\n",
    "      \n",
    "    Returns: \n",
    "    data_scaled (np.array): Scaled array   \n",
    "    \"\"\"    \n",
    "    \n",
    "    df = pd.DataFrame()\n",
    "    data_scaled = np.empty([data.shape[1], data.shape[0]])\n",
    "    for i in range(data.shape[1]):\n",
    "        data_scaled[i] = target_scaler.inverse_transform([data[:,i]])\n",
    "    return data_scaled.transpose()\n",
    "\n",
    "df_train_scaled=scale(df_train)\n",
    "df_test_scaled=scale(df_test, False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 197
    },
    "colab_type": "code",
    "id": "4w2K9hVHXV6-",
    "outputId": "b48986d7-e9b4-4a63-c71c-54b43b1ea7ad"
   },
   "outputs": [],
   "source": [
    "# Review scaled values\n",
    "\n",
    "df_train_scaled.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CNw7mVozbsA2"
   },
   "source": [
    "### Create sequences of time series data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "76Wp5gRtbrA9"
   },
   "outputs": [],
   "source": [
    "def reframe(data, n_input_steps = n_input_steps, n_output_steps = n_output_steps, target_col = target_col):\n",
    "\n",
    "    target_col_num = data.columns.get_loc(target_col)    \n",
    "    \n",
    "    # Iterate through data and create sequences of features and outputs\n",
    "    df = pd.DataFrame(data)\n",
    "    cols=list()\n",
    "    for i in range(n_input_steps, 0, -1):\n",
    "        cols.append(df.shift(i))\n",
    "    for i in range(0, n_output_steps):\n",
    "        cols.append(df.shift(-i))\n",
    "        \n",
    "    # Concatenate values and remove any missing values\n",
    "    df = pd.concat(cols, axis=1)\n",
    "    df.dropna(inplace=True)\n",
    "    \n",
    "    # Split the data into feature and target variables\n",
    "    n_feature_cols = n_input_steps * n_features\n",
    "    features = df.iloc[:,0:n_feature_cols]\n",
    "    target_cols = [i for i in range(n_feature_cols + target_col_num, n_feature_cols + n_output_steps * n_features, n_features)]\n",
    "    targets = df.iloc[:,target_cols]\n",
    "\n",
    "    return (features, targets)\n",
    "\n",
    "X_train_reframed, y_train_reframed = reframe(df_train_scaled)\n",
    "X_test_reframed, y_test_reframed = reframe(df_test_scaled)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "tqhwqDsxb-E_"
   },
   "source": [
    "## Build a model and submit your training job to AI Platform\n",
    "\n",
    "The model you're building here trains pretty fast so you could train it in this notebook, but for more computationally expensive models, it's useful to train them in the Cloud. To use AI Platform Training, you'll package up your training code and submit a training job to the AI Platform Prediction service.\n",
    "\n",
    "In your training script, you'll also export your trained `SavedModel` to a Cloud Storage bucket."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare test data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "HG1bYOO8b4sN"
   },
   "outputs": [],
   "source": [
    "# Reshape test data to match model inputs and outputs\n",
    "\n",
    "X_train = X_train_reframed.values.reshape(-1, n_input_steps, n_features)\n",
    "X_test = X_test_reframed.values.reshape(-1, n_input_steps, n_features)\n",
    "y_train = y_train_reframed.values.reshape(-1, n_output_steps)\n",
    "y_test = y_test_reframed.values.reshape(-1, n_output_steps)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Specify directories to be used later\n",
    "\n",
    "TRAINER_DIR = 'trainer'\n",
    "EXPORT_DIR = 'tf_export'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create trainer directory if it doesn't already exist\n",
    "\n",
    "!mkdir $TRAINER_DIR"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copy numpy arrays to npy files\n",
    "\n",
    "np.save(TRAINER_DIR + '/x_train.npy', X_train)\n",
    "np.save(TRAINER_DIR + '/x_test.npy', X_test)\n",
    "np.save(TRAINER_DIR + '/y_train.npy', y_train)\n",
    "np.save(TRAINER_DIR + '/y_test.npy', y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare model code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write training code out to a file that will be submitted to the training job\n",
    "# Note: f-strings are supported in Python 3.6 and above\n",
    "\n",
    "model_template = f\"\"\"import argparse\n",
    "import numpy as np\n",
    "import os\n",
    "import tempfile\n",
    "\n",
    "from google.cloud import storage\n",
    "from tensorflow import keras\n",
    "from tensorflow.keras import Sequential\n",
    "from tensorflow.keras.layers import Dense, LSTM\n",
    "from tensorflow.keras.callbacks import EarlyStopping\n",
    "\n",
    "n_features = {n_features} # Two features: y (previous values) and whether the date is a holiday\n",
    "n_input_steps = {n_input_steps} # Lookback window\n",
    "n_output_steps = {n_output_steps} # How many steps to predict forward\n",
    "\n",
    "epochs = {epochs} # How many passes through the data (early-stopping will cause training to stop before this)\n",
    "patience = {patience} # Terminate training after the validation loss does not decrease after this many epochs\n",
    "\n",
    "def download_blob(bucket_name, source_blob_name, destination_file_name):\n",
    "    '''Downloads a blob from the bucket.'''\n",
    "    # bucket_name = \"your-bucket-name\"\n",
    "    # source_blob_name = \"storage-object-name\"\n",
    "    # destination_file_name = \"local/path/to/file\"\n",
    "\n",
    "    storage_client = storage.Client()\n",
    "\n",
    "    bucket = storage_client.bucket(bucket_name)\n",
    "\n",
    "    # Construct a client side representation of a blob.\n",
    "    # Note `Bucket.blob` differs from `Bucket.get_blob` as it doesn't retrieve\n",
    "    # any content from Google Cloud Storage. As we don't need additional data,\n",
    "    # using `Bucket.blob` is preferred here.\n",
    "    blob = bucket.blob(source_blob_name)\n",
    "    blob.download_to_filename(destination_file_name)\n",
    "\n",
    "    print(\"Blob \" + source_blob_name + \" downloaded to \" + destination_file_name + \".\")\n",
    "\n",
    "def extract_bucket_and_prefix_from_gcs_path(gcs_path: str):\n",
    "    '''Given a complete GCS path, return the bucket name and prefix as a tuple.\n",
    "\n",
    "    Example Usage:\n",
    "\n",
    "        bucket, prefix = extract_bucket_and_prefix_from_gcs_path(\n",
    "            \"gs://example-bucket/path/to/folder\"\n",
    "        )\n",
    "\n",
    "        # bucket = \"example-bucket\"\n",
    "        # prefix = \"path/to/folder\"\n",
    "\n",
    "    Args:\n",
    "        gcs_path (str):\n",
    "            Required. A full path to a Google Cloud Storage folder or resource.\n",
    "            Can optionally include \"gs://\" prefix or end in a trailing slash \"/\".\n",
    "\n",
    "    Returns:\n",
    "        Tuple[str, Optional[str]]\n",
    "            A (bucket, prefix) pair from provided GCS path. If a prefix is not\n",
    "            present, a None will be returned in its place.\n",
    "    '''\n",
    "    if gcs_path.startswith(\"gs://\"):\n",
    "        gcs_path = gcs_path[5:]\n",
    "    if gcs_path.endswith(\"/\"):\n",
    "        gcs_path = gcs_path[:-1]\n",
    "\n",
    "    gcs_parts = gcs_path.split(\"/\", 1)\n",
    "    gcs_bucket = gcs_parts[0]\n",
    "    gcs_blob_prefix = None if len(gcs_parts) == 1 else gcs_parts[1]\n",
    "\n",
    "    return (gcs_bucket, gcs_blob_prefix)\n",
    "\n",
    "def get_args():\n",
    "    parser = argparse.ArgumentParser()\n",
    "    parser.add_argument(\n",
    "        '--data-uri',\n",
    "        default=None,\n",
    "        help='URL where the training files are located')\n",
    "    args = parser.parse_args()\n",
    "    print(args)\n",
    "    return args\n",
    "\n",
    "def main():\n",
    "    args = get_args()\n",
    "    bucket_name, blob_prefix = extract_bucket_and_prefix_from_gcs_path(args.data_uri)\n",
    "    \n",
    "    # Get the training data and convert back to np arrays\n",
    "    local_data_dir = os.path.join(os.getcwd(), tempfile.gettempdir())\n",
    "    files = ['x_train.npy', 'y_train.npy', 'x_test.npy', 'y_test.npy']\n",
    " \n",
    "    for file in files:\n",
    "        download_blob(bucket_name, os.path.join(blob_prefix,file), os.path.join(local_data_dir,file))\n",
    "\n",
    "    X_train = np.load(local_data_dir + '/x_train.npy')\n",
    "    y_train = np.load(local_data_dir + '/y_train.npy')\n",
    "    X_test = np.load(local_data_dir + '/x_test.npy')\n",
    "    y_test = np.load(local_data_dir + '/y_test.npy')\n",
    "        \n",
    "    # Build and train the model\n",
    "    model = Sequential([\n",
    "        LSTM({lstm_units}, input_shape=[n_input_steps, n_features], recurrent_activation=None),\n",
    "        Dense(n_output_steps)])\n",
    "\n",
    "    model.compile(optimizer='adam', loss='mae')\n",
    "\n",
    "    early_stopping = EarlyStopping(monitor='val_loss', patience=patience)\n",
    "    _ = model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test), epochs=epochs, callbacks=[early_stopping])\n",
    "    \n",
    "    # Export the model\n",
    "    model.save(os.environ[\"AIP_MODEL_DIR\"])\n",
    "    \n",
    "if __name__ == '__main__':\n",
    "    main()\n",
    "\"\"\"\n",
    "\n",
    "with open(os.path.join(TRAINER_DIR, 'task.py'), 'w') as f:\n",
    "    f.write(model_template.format(**globals()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copy the data files to a GCS bucket\n",
    "\n",
    "!gcloud storage cp --recursive trainer/*.npy $BUCKET_URI/$TRAINER_DIR"   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# List the contents of the bucket to ensure they were copied properly\n",
    "\n",
    "!gcloud storage ls $BUCKET_URI/$TRAINER_DIR"   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Submit training job"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Set training job parameters\n",
    "\n",
    "CMDARGS = [\n",
    "    f\"--data-uri={BUCKET_URI}/{TRAINER_DIR}\"\n",
    "]\n",
    "TRAIN_VERSION = \"tf-cpu.2-6\"\n",
    "DEPLOY_VERSION = \"tf2-cpu.2-6\"\n",
    "\n",
    "TRAIN_IMAGE = \"us-docker.pkg.dev/vertex-ai/training/{}:latest\".format(TRAIN_VERSION)\n",
    "DEPLOY_IMAGE = \"us-docker.pkg.dev/vertex-ai/prediction/{}:latest\".format(DEPLOY_VERSION)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Re-run these additional parameters if you need to create a new training job\n",
    "\n",
    "TIMESTAMP = str(datetime.datetime.now().time())\n",
    "JOB_NAME = 'vertex_ai_training_' + TIMESTAMP\n",
    "MODEL_DISPLAY_NAME = MODEL_NAME + TIMESTAMP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create and run the training job\n",
    "\n",
    "job = aiplatform.CustomTrainingJob(\n",
    "    display_name=JOB_NAME,\n",
    "    script_path=f\"{TRAINER_DIR}/task.py\",\n",
    "    container_uri=TRAIN_IMAGE,\n",
    "    model_serving_container_image_uri=DEPLOY_IMAGE,\n",
    ")\n",
    "\n",
    "model = job.run(\n",
    "        model_display_name=MODEL_DISPLAY_NAME,\n",
    "        args=CMDARGS,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Deploy the model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "DEPLOYED_NAME = f\"{MODEL_NAME}_deployed-\" + TIMESTAMP\n",
    "\n",
    "endpoint = model.deploy(\n",
    "    deployed_model_display_name=DEPLOYED_NAME,\n",
    "    machine_type=\"n1-standard-4\",\n",
    "    min_replica_count=1,\n",
    "    max_replica_count=1,\n",
    "    traffic_split={\"0\": 100},\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ZFglcq4kcd4R"
   },
   "source": [
    "## Get predictions on deployed model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get predictions for the first test instance\n",
    "\n",
    "raw_predictions = endpoint.predict(instances=X_test.tolist()).predictions[0]\n",
    "predicted_values = inverse_scale(np.array([raw_predictions])).round()\n",
    "\n",
    "actual_values = inverse_scale(np.array([y_test[0]]))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print prediction and compare to actual value\n",
    "\n",
    "print('Predicted riders:', predicted_values)\n",
    "print('Actual riders:   ', actual_values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleanup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "delete_training_job = True\n",
    "delete_model = True\n",
    "delete_endpoint = True\n",
    "\n",
    "# Warning: Setting this to true will delete everything in your bucket\n",
    "delete_bucket = False\n",
    "\n",
    "# Delete the training job\n",
    "job.delete()\n",
    "\n",
    "# Delete the endpoint\n",
    "endpoint.delete(force=True)\n",
    "\n",
    "# Delete the model\n",
    "model.delete()\n",
    "\n",
    "# Warning: uncomment this section only if you want to delete the entire bucket\n",
    "# if delete_bucket and \"BUCKET\" in globals():\n",
    "#     ! gcloud storage rm --recursive $BUCKET"   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, you've learned how to:\n",
    "* Prepare data and models for training in the cloud\n",
    "* Train your model and monitor the progress of the job with AI Platform Training\n",
    "* Predict using the model with AI Platform Predictions"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "liquor-sales-xai.ipynb",
   "provenance": []
  },
  "environment": {
   "kernel": "python3",
   "name": "tf2-gpu.2-6.m82",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-6:m82"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
